
libtorch: new recipe #24759 (Draft)

valgur wants to merge 17 commits into master
Conversation

@valgur (Contributor) commented Jul 30, 2024

Summary

Changes to recipe: libtorch/2.4.0

Motivation

Tensors and Dynamic neural networks in Python with strong GPU acceleration.

https://github.com/pytorch/pytorch

Details

Continues from #5100 by @SpaceIm.

CUDA, HIP and SYCL backends are currently disabled, since the PR is complex enough already; these can be addressed in a follow-up PR. Vulkan should currently be usable as a GPU backend; Metal support is still TODO.

The distributed feature is disabled as well, both to limit the scope and because openmpi is not yet available (#18980).

Android and iOS builds are probably broken and need testing.

Non-OpenBLAS BLAS backends are probably not usable, since OpenBLAS is required for LAPACK. A separate LAPACK recipe (such as #23798) would be needed to fix that.

Closes #6861.

TODO:

  • Export missing CMake variables.
  • Test with Metal on macOS.
  • Submit bugfix patches upstream.
  • Create a recipe for pocketfft and unvendor.


XNNPACK was not correctly added to project dependencies.
Prefer namespaced targets, if possible.
@conan-center-bot
Hooks produced the following warnings for commit 87a1370
libtorch/2.4.0@#f680755600363ae5e29186ad5b798792
post_source(): WARN: [SHORT_PATHS USAGE (KB-H066)] The file './third_party/kineto/libkineto/third_party/dynolog/hbt/src/perf_event/json_events/generated/intel/sapphirerapids_uncore_experimental.cpp' has a very long path and may exceed Windows max path length. Add 'short_paths = True' in your recipe.
post_source(): WARN: [SHORT_PATHS USAGE (KB-H066)] The file './third_party/kineto/libkineto/third_party/dynolog/third_party/googletest/googlemock/include/gmock/internal/custom/gmock-generated-actions.h' has a very long path and may exceed Windows max path length. Add 'short_paths = True' in your recipe.
post_source(): WARN: [SHORT_PATHS USAGE (KB-H066)] The file './third_party/kineto/libkineto/third_party/dynolog/third_party/json/doc/mkdocs/docs/api/byte_container_with_subtype/byte_container_with_subtype.md' has a very long path and may exceed Windows max path length. Add 'short_paths = True' in your recipe.
post_source(): WARN: [SHORT_PATHS USAGE (KB-H066)] The file './third_party/kineto/libkineto/third_party/dynolog/third_party/json/test/reports/2016-09-09-nativejson_benchmark/conformance_overall_Result.png' has a very long path and may exceed Windows max path length. Add 'short_paths = True' in your recipe.
post_source(): WARN: [SHORT_PATHS USAGE (KB-H066)] The file './third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM/testing/python3/dcptestautomation/parse_dcgmproftester_single_metric.py' has a very long path and may exceed Windows max path length. Add 'short_paths = True' in your recipe.
post_source(): WARN: [SHORT_PATHS USAGE (KB-H066)] The file './third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM/testing/python3/tests/nvswitch_tests/test_nvswitch_with_running_fm.py' has a very long path and may exceed Windows max path length. Add 'short_paths = True' in your recipe.
post_source(): WARN: [SHORT_PATHS USAGE (KB-H066)] The file './third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM/scripts/verify_package_contents/datacenter-gpu-manager_VERSION_arm64.deb.txt' has a very long path and may exceed Windows max path length. Add 'short_paths = True' in your recipe.
post_source(): WARN: [SHORT_PATHS USAGE (KB-H066)] The file './test/dynamo_expected_failures/TestExpandedWeightFunctionalCPU.test_expanded_weights_per_sample_grad_input_no_grad_nn_functional_group_norm_cpu_float64' has a very long path and may exceed Windows max path length. Add 'short_paths = True' in your recipe.
post_source(): WARN: [SHORT_PATHS USAGE (KB-H066)] The file './test/dynamo_skips/TestProxyTensorOpInfoCPU.test_make_fx_symbolic_exhaustive_inplace_nn_functional_feature_alpha_dropout_without_train_cpu_float32' has a very long path and may exceed Windows max path length. Add 'short_paths = True' in your recipe.
post_package(): WARN: [SHORT_PATHS USAGE (KB-H066)] The file './include/ATen/native/transformers/cuda/mem_eff_attention/iterators/predicated_tile_iterator_residual_last.h' has a very long path and may exceed Windows max path length. Add 'short_paths = True' in your recipe.
post_package(): WARN: [SHORT_PATHS USAGE (KB-H066)] The file './include/ATen/native/transformers/cuda/mem_eff_attention/epilogue/epilogue_thread_apply_logsumexp.h' has a very long path and may exceed Windows max path length. Add 'short_paths = True' in your recipe.
post_package(): WARN: [SHORT_PATHS USAGE (KB-H066)] The file './include/ATen/ops/max_pool2d_with_indices_backward_compositeexplicitautogradnonfunctional_dispatch.h' has a very long path and may exceed Windows max path length. Add 'short_paths = True' in your recipe.
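The KB-H066 warnings above suggest opting the recipe into Conan's short-paths mechanism so deeply nested source files stay below the Windows MAX_PATH limit. A minimal sketch of where that attribute goes in the recipe (class name and surrounding members are illustrative, not the actual conanfile):

```python
from conan import ConanFile


class LibtorchConan(ConanFile):
    name = "libtorch"
    version = "2.4.0"
    # Conan 1.x: redirect the source/build folders to a short root path
    # on Windows so deeply nested files (e.g. under third_party/kineto)
    # do not exceed the MAX_PATH limit flagged by KB-H066. The attribute
    # is ignored by the Conan 2 cache, which already uses short paths.
    short_paths = True
```

Note this only affects the Conan v1 pipeline; the Conan 2 cache layout makes the attribute a no-op there.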

@valgur mentioned this pull request Aug 7, 2024
@conan-center-bot (Collaborator)

Conan v1 pipeline ❌

Failure in build 6 (ff36ad93e96684208459986e2a5088b01a02883d):

  • libtorch/2.4.0:
    An unexpected error happened and has been reported

Note: To save resources, CI tries to finish as soon as an error is found. For this reason you might find that not all the references have been launched or not all the configurations for a given reference. Also, take into account that we cannot guarantee the order of execution as it depends on CI workload and workers availability.


Conan v2 pipeline ❌

Note: Conan v2 builds are now mandatory. Please read our discussion about it.

The v2 pipeline failed. Please, review the errors and note this is required for pull requests to be merged. In case this recipe is still not ported to Conan 2.x, please, ping @conan-io/barbarians on the PR and we will help you.

Failure in build 8 (ff36ad93e96684208459986e2a5088b01a02883d):

  • libtorch/2.4.0:
    CI failed to create some packages (All logs)

    Logs for packageID 999239f19123416d584ffc8c46c1df33a363bf09:
    [settings]
    arch=armv8
    build_type=Release
    compiler=apple-clang
    compiler.cppstd=17
    compiler.libcxx=libc++
    compiler.version=13
    os=Macos
    [options]
    */*:shared=False
    
    [...]
    --   USE_NCCL              : OFF
    --   USE_NNPACK            : OFF
    --   USE_NUMPY             : OFF
    --   USE_OBSERVERS         : False
    --   USE_OPENCL            : False
    --   USE_OPENMP            : False
    --   USE_MIMALLOC          : False
    --   USE_VULKAN            : False
    --   USE_PROF              : OFF
    --   USE_PYTORCH_QNNPACK   : True
    --   USE_XNNPACK           : True
    --   USE_DISTRIBUTED       : OFF
    --   Public Dependencies  : 
    --   Private Dependencies : cpuinfo;fp16::fp16;fmt::fmt;pthreadpool::pthreadpool;flatbuffers::flatbuffers;xnnpack::xnnpack;Threads::Threads;cpuinfo;pytorch_qnnpack;fp16;onnx::onnx;foxi_loader;fmt::fmt-header-only;kineto
    --   Public CUDA Deps.    : 
    --   Private CUDA Deps.   : 
    --   USE_COREML_DELEGATE     : False
    --   BUILD_LAZY_TS_BACKEND   : True
    --   USE_ROCM_KERNEL_ASSERT : OFF
    -- Configuring done (5.6s)
    -- Generating done (0.5s)
    -- Build files have been written to: /Users/jenkins/workspace/prod-v2/bsr/75828/debae/p/b/libto3d26c80da6c4e/b/build/Release
    [  0%] Linking C static library ../../lib/libfxdiv.a
    [  0%] Built target clog
    [  0%] Built target libkineto_defs.bzl
    ar: no archive members specified
    usage:  ar -d [-TLsv] archive file ...
    	ar -m [-TLsv] archive file ...
    	ar -m [-abiTLsv] position archive file ...
    	ar -p [-TLsv] archive [file ...]
    	ar -q [-cTLsv] archive file ...
    	ar -r [-cuTLsv] archive file ...
    	ar -r [-abciuTLsv] position archive file ...
    	ar -t [-TLsv] archive [file ...]
    	ar -x [-ouTLsv] archive [file ...]
    make[2]: *** [lib/libfxdiv.a] Error 1
    make[1]: *** [confu-deps/pytorch_qnnpack/CMakeFiles/fxdiv.dir/all] Error 2
    make[1]: *** Waiting for unfinished jobs....
    [  0%] Built target kineto_api
    [  1%] Built target kineto_base
    [  8%] Built target c10
    [  8%] Built target ATEN_CPU_FILES_GEN_TARGET
    make: *** [all] Error 2
    
    libtorch/2.4.0: ERROR: 
    Package '999239f19123416d584ffc8c46c1df33a363bf09' build failed
    libtorch/2.4.0: WARN: Build folder /Users/jenkins/workspace/prod-v2/bsr/75828/debae/p/b/libto3d26c80da6c4e/b/build/Release
    ERROR: libtorch/2.4.0: Error in build() method, line 497
    	cmake.build(cli_args=["--parallel", "1"])
    	ConanException: Error 2 while executing
    


@valgur mentioned this pull request Sep 19, 2024

@hasB4K commented Sep 26, 2024
Hello @valgur, thanks for this amazing PR. Do you plan to continue working on it? 🤞 Having libtorch in Conan would be so neat. Since OpenMPI is now available, do you plan to let the user enable the distributed feature?

tc.variables["BLAS"] = self._blas_cmake_option_value

tc.variables["MSVC_Z7_OVERRIDE"] = False

@keef-cognitiv (Contributor) commented Oct 4, 2024

Incidentally, this also needs

tc.variables["CMAKE_CXX_EXTENSIONS"] = True

Tested this while running a build that sets compiler.cppstd. If it uses a non-GNU standard (which for other packages it must be), ATen breaks with the same error as pytorch/QNNPACK#67.

This converts -std=c++17, for example, to -std=gnu++17.

It's probably not necessary on Windows, but it also shouldn't hurt.
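In recipe terms, the suggestion above amounts to setting the variable on the toolchain in generate(). A hedged sketch with everything except the relevant lines omitted (class name and structure are illustrative, not the recipe's actual code):

```python
from conan import ConanFile
from conan.tools.cmake import CMakeToolchain


class LibtorchConan(ConanFile):
    name = "libtorch"
    # settings, options, requirements, etc. omitted; fragment only

    def generate(self):
        tc = CMakeToolchain(self)
        # Keep GNU language extensions enabled so CMake emits
        # -std=gnu++17 instead of -std=c++17; ATen relies on GNU
        # extensions and otherwise fails as in pytorch/QNNPACK#67.
        tc.variables["CMAKE_CXX_EXTENSIONS"] = True
        tc.generate()
```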

whole_archive = f"-WHOLEARCHIVE:{lib_fullpath}"
else:
lib_fullpath = os.path.join(lib_folder, f"lib{libname}.a")
whole_archive = f"-Wl,--whole-archive,{lib_fullpath},--no-whole-archive"
@lia-viam commented Oct 8, 2024

Thanks for your work on this PR. I am not using this library directly but found it while following some GitHub issues on whole-archive linking.

For this line, I wonder if it is possible to do this with -Wl,--push-state,--pop-state? See e.g.
https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_LINK_LIBRARY_USING_FEATURE.html#loading-a-whole-static-library

https://github.com/Kitware/CMake/blob/ddf1d2944fe53b0fb0be79621c53d2d235fce07b/Modules/Platform/Linker/GNU.cmake#L35
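To make the suggestion concrete, here is a sketch of how the recipe's flag construction could adopt --push-state/--pop-state, so the --whole-archive state cannot leak onto libraries that follow on the link line. It assumes GNU binutils >= 2.25 (where --push-state was introduced); the function name and structure are illustrative, not the recipe's actual helper:

```python
import os


def whole_archive_flag(lib_folder, libname, linker="gnu"):
    """Build a force-load linker flag for one static library.

    "gnu" wraps --whole-archive in --push-state/--pop-state so the
    whole-archive mode is restored afterwards instead of leaking onto
    later libraries; "msvc" uses the equivalent -WHOLEARCHIVE flag.
    Illustrative sketch only, not the recipe's actual helper.
    """
    if linker == "msvc":
        lib_fullpath = os.path.join(lib_folder, f"{libname}.lib")
        return f"-WHOLEARCHIVE:{lib_fullpath}"
    lib_fullpath = os.path.join(lib_folder, f"lib{libname}.a")
    return f"-Wl,--push-state,--whole-archive,{lib_fullpath},--pop-state"


# On POSIX this prints:
# -Wl,--push-state,--whole-archive,/usr/lib/libtorch_cpu.a,--pop-state
print(whole_archive_flag("/usr/lib", "torch_cpu"))
```

Compared to the --whole-archive/--no-whole-archive pair in the current recipe code, --pop-state also restores any other link-state flags that were active, which is why CMake's own GNU.cmake linker module uses this form.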

Development

Successfully merging this pull request may close these issues.

[request] pytorch/1.9
7 participants